A New Way to Visual Representation and Learning
Abstract
We explain some limitations of the regularization theory of early vision and formulate visual representation and learning as the statistical mechanics of surfaces with defects. In this new paradigm, the pinning energy consists of a set of local oriented regularization fields. To reconstruct a D-dimensional surface, the known data, generally a set of patches of dimension $0, 1, \dots, D-1$, are taken as defects that "pin" the surface. Each realization of the surface pinned by the known data contributes a Boltzmann weight, and the ensemble average over all realizations gives the reconstructed surface. In 2D, we present a neural network dynamics that approximates the ensemble average. The dynamics displays recovery of 1D shapes from local pinning bars, collinear grouping, edge and region filling-in, illusory figure perception, and perception of apparent brightness as a kind of dynamic phase transition.

1 Regularization Theory and Its Limitations

Visual representation and learning can be seen as hypersurface reconstruction [1]. The regularization theory of early vision, together with the architecture it induces, radial basis function (RBF) networks, has become an important paradigm. Owing to their mathematical simplicity and biological support, RBF networks have been suggested as building blocks of the brain and have been used for object recognition and motion control [2]. However, regularization theory and the induced RBF architecture suffer from the following shortcomings:
1. Quadratic functionals are at best approximations in many cases.
2. The inputs are assumed to be uncorrelated, sparse points. In early vision, however, the input data are usually correlated, for example a segment of a contrast edge or a surface patch of constant curvature or intensity. The usual functionals are poor approximations in these cases because they do not use the correlation information.
3. The linear superposition of RBFs contrasts sharply with the massive connectivity of the cerebral cortex.
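For contrast with what follows, here is a minimal sketch of a standard regularization (RBF) network in 1D: the reconstruction is a linear superposition of Gaussians centred on uncorrelated data points, with coefficients obtained from $(G + \alpha I)c = y$, where $G$ is the kernel matrix. The Gaussian kernel, the width `sigma`, the regularization weight `alpha`, and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian(r2, sigma):
    """Gaussian radial basis function of a squared distance r2."""
    return math.exp(-r2 / (2.0 * sigma ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_rbf(xs, ys, sigma=1.0, alpha=1e-3):
    """Coefficients c of f(x) = sum_i c_i G(x - x_i), solving (G + alpha*I) c = y."""
    n = len(xs)
    g = [[gaussian((xs[i] - xs[j]) ** 2, sigma) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(g, ys)


def predict(x, xs, coeffs, sigma=1.0):
    """Reconstruction as a linear superposition of RBFs centred on the data."""
    return sum(c * gaussian((x - xi) ** 2, sigma) for c, xi in zip(coeffs, xs))
```

With a tiny `alpha` the fit nearly interpolates the data points; a larger `alpha` trades fidelity for smoothness. Note that each data point contributes independently, which is exactly the "uncorrelated, sparse points" assumption criticized in item 2.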
Correlated representation is the primary strategy used by the brain. For example, cells in the primary visual cortex respond most strongly to simple bars of particular orientations, and cells higher in the visual hierarchy can represent more complex shapes by integrating inputs from lower levels [3].

2 Representation and Learning as Statistical Mechanics of Surfaces with Defects

Visual representation and learning can generally be seen as the statistical mechanics of surfaces with defects (SMSD) [4]. Take 2D surface reconstruction as an example. Continuous surfaces ($D = 2$) can be seen as manifolds embedded in a space of dimension $d = D + 1 = 3$. The known data act as pinning centers that "pin" the wandering surfaces in this $d = 3$ space. The effective energy functional is

$$H(f) = E_{pin} + \frac{K_1}{2}\int (\nabla f)^2 \, d\mu + \frac{K_2}{2}\int (\nabla^2 f)^2 \, d\mu \qquad (1)$$

In (1), $f$ is the surface, $K_1$ and $K_2$ are regularization constants, $\nabla$ is the differential operator, $d\mu$ is the integration measure, and $E_{pin}$ is the pinning (kernel) energy. The last two terms come from the stretching and bending modes. In the usual regularization theory, the pinning centers are uncorrelated points and $E_{pin} = \sum_i (f(x_i) - y_i)^2$. The partition function associated with (1) is

$$Z = \sum_{\{f\}} e^{-\beta H(f)} \qquad (2)$$

where the summation is over all possible surface configurations and $\beta$ is a constant that sets the noise level ($1/\beta$ plays the role of a temperature). The reconstructed surface is then the ensemble average

$$\langle f \rangle = \frac{1}{Z}\sum_{\{f\}} f \, e^{-\beta H(f)} \qquad (3)$$

The mean-field approximation recovers the solution of standard regularization theory.

Intuitively, the best way to pin a D-dimensional surface is with a set of D-dimensional patches. Here, we suggest the pinning energy

$$E_{pin} = \sum_{\text{patches}} \frac{1}{2}\sum_{i,j=1}^{2}\int \lambda_{ij}(r)\,\big((u_i\cdot\nabla) f\big)\big((u_j\cdot\nabla) f\big)\, d\mu \qquad (5)$$

where $u_1$ is the normal of the pinning patch and $u_2 \perp u_1$, so that $(u_1, u_2)$ are two orthogonal unit vectors, and $r$ is the distance from the pinning center in the local coordinates $(u_1, u_2)$. Each $\lambda_{ij}(r) \ge 0$, with $\lambda_{ij}(r) \to 0$ as $r \to \infty$, is a local oriented regularization field (LORF). We impose $\lambda_{11}(r) > \lambda_{21}(r)$ and $\lambda_{12}(r) > \lambda_{22}(r)$ to penalize deviation from the tangential plane of the patch.

3 Local Orientedly Regularized Neural Networks (LORNN)

We do not have general solutions to Eqs. (1)-(5). For now, we suggest using powerful algorithms from statistical physics to solve Eqs. (3) and (4). Here, we design a neural network dynamics involving LORFs for shape representation and visual learning. The dynamics can be seen as an approximation to the new paradigm at very low noise level. For illustration, we present the dynamics in 2D for 1D shape representation, perceptual grouping, illusory figure perception, and perception of apparent brightness. In these cases, the effective pinnings of interest are simple bars.

Consider a 2D array of neurons. The system is described by

$$h = -K I + \frac{g_1}{2}(\nabla I)^2 + \frac{g_2}{2}(\nabla^2 I)^2 + \sum R \qquad (6)$$

$$R = \sum_{i,j=1}^{2} \lambda_{ij}(r)\,\big((n_i\cdot\nabla) I\big)\big((n_j\cdot\nabla) I\big) \qquad (7)$$

In (6), $h$ is the energy per site, $I \in [0, 1]$ is the state variable, $K$ is the applied field, and $\nabla$ is a differential operator. $R$ is the regularization term of each pinning bar, and the summation runs over all pinning bars; each $R$ contains the four terms of (7), where $n_1$ is the tangential direction and $n_2$ the normal direction of the pinning bar. The summation in (8) is over specific receptive fields, and $F(\cdot)$ is a characteristic function. For simplicity, in this paper we choose the characteristic function and the receptive field as

$$F(M) = \begin{cases} -1, & M \le -1 \\ M, & -1 < M < 1 \\ 1, & \text{otherwise} \end{cases}$$

$$J_{ij} = J \quad \text{for } 0 < |i - j| < l$$

(the value for $i = j$ is illegible in the source), where $l$ is the size of the receptive fields. We adopt a simple learning dynamics, Eq. (12), which is illegible in the source. By (12), all connection weights in each neuron's receptive field are updated in the same way.

Pinning bars induce an ensemble of oriented Gaussian (OG) or oriented exponential (OE) distributions. In the OG (formula illegible in the source), $X$ lies along the tangential direction of the pinning bar and $Y$ along its normal direction, and the orientation coefficient, a ratio greater than 1, sets the anisotropy.
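The ensemble average (2)-(3) can be made concrete with a brute-force sketch on a toy 1D "surface": a few lattice sites, each taking one of a handful of discrete heights. This sketch assumes the standard uncorrelated point pinning $E_{pin} = \sum_i (f_i - y_i)^2$, a discrete stretching term, and no bending term; the function names and parameters are hypothetical, and this is an illustration of the Boltzmann average, not the paper's algorithm. Exhaustive enumeration is feasible only for tiny systems, which is why approximations are needed in practice.

```python
import itertools
import math


def energy(f, pins, k1):
    """Discrete H(f): point pinning plus a stretching term (bending omitted)."""
    e_pin = sum((f[i] - y) ** 2 for i, y in pins.items())
    e_stretch = 0.5 * k1 * sum((f[i + 1] - f[i]) ** 2 for i in range(len(f) - 1))
    return e_pin + e_stretch


def ensemble_average(n_sites, levels, pins, k1, beta):
    """Boltzmann-weighted average <f_i> over every configuration (brute force)."""
    configs = list(itertools.product(levels, repeat=n_sites))
    energies = [energy(f, pins, k1) for f in configs]
    e_min = min(energies)  # shift energies to avoid underflow in exp
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function, up to the factor exp(-beta * e_min)
    return [sum(w * f[i] for w, f in zip(weights, configs)) / z
            for i in range(n_sites)]


# Five sites, heights 0..3, surface pinned at f_0 = 0 and f_4 = 3.
low_noise = ensemble_average(5, (0, 1, 2, 3), {0: 0, 4: 3}, k1=1.0, beta=50.0)
high_noise = ensemble_average(5, (0, 1, 2, 3), {0: 0, 4: 3}, k1=1.0, beta=1e-9)
```

At large $\beta$ the average concentrates on the lowest-energy configurations: here four degenerate ground states average to the ramp 0, 0.75, 1.5, 2.25, 3. At small $\beta$ every site tends to the mean of the allowed heights, 1.5.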
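The oriented regularization term (7) for a single pinning bar can be sketched as below, assuming the LORFs are oriented Gaussians of the form $\lambda_{ij}(r) = \lambda_{ij}(0)\exp\!\big(-r_\parallel^2/2\sigma_\parallel^2 - r_\perp^2/2\sigma_\perp^2\big)$. The factor of 2 in the exponent, the amplitudes used, and the function names are assumptions, since the source formula is illegible. With $\lambda_{11} > \lambda_{22}$, intensity changes along the bar cost more than changes across it, so the bar's value tends to propagate along its own direction.

```python
import math


def lorf(amplitude, r_par, r_perp, sigma_par, sigma_perp):
    """Oriented-Gaussian LORF (assumed form), decaying to 0 far from the bar."""
    return amplitude * math.exp(-r_par ** 2 / (2 * sigma_par ** 2)
                                - r_perp ** 2 / (2 * sigma_perp ** 2))


def regularization_term(grad, center, site, theta, amps, sigma_par, sigma_perp):
    """R = sum_ij lambda_ij(r) ((n_i . grad I)(n_j . grad I)) for one pinning bar.

    grad: (dI/dx, dI/dy) at `site`; theta: bar orientation in radians;
    amps: 2x2 list of amplitudes lambda_ij(0) (hypothetical values).
    """
    n1 = (math.cos(theta), math.sin(theta))   # tangential direction of the bar
    n2 = (-math.sin(theta), math.cos(theta))  # normal direction of the bar
    dx, dy = site[0] - center[0], site[1] - center[1]
    r_par = dx * n1[0] + dy * n1[1]           # distance along the bar
    r_perp = dx * n2[0] + dy * n2[1]          # distance across the bar
    d = (n1[0] * grad[0] + n1[1] * grad[1],   # (n1 . grad) I
         n2[0] * grad[0] + n2[1] * grad[1])   # (n2 . grad) I
    return sum(lorf(amps[i][j], r_par, r_perp, sigma_par, sigma_perp) * d[i] * d[j]
               for i in range(2) for j in range(2))


# Horizontal bar at the origin; cross terms set to 0 for simplicity (an assumption).
amps = [[1.0, 0.0], [0.0, 0.1]]
along = regularization_term((1.0, 0.0), (0.0, 0.0), (2.0, 0.0), 0.0, amps, 3.0, 1.0)
across = regularization_term((0.0, 1.0), (0.0, 0.0), (2.0, 0.0), 0.0, amps, 3.0, 1.0)
```

Here a unit gradient along the bar costs ten times as much as the same gradient across it, and the cost fades with distance from the bar, as the conditions $\lambda(r) \ge 0$, $\lambda(r) \to 0$ require.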
At every site, the input is a nonlinear superposition of an ensemble of OGs. We consider only the nonlinear superposition of two OGs and suggest

$$I = \begin{cases} I_1 + I_2, & I_1 < \theta_1 \text{ or } I_2 < \theta_1 \\ I_1 + I_2 + P(I_1 + I_2), & \theta_1 \le I_1 \le \theta_2,\ \theta_1 \le I_2 \le \theta_2 \\ I_1 + I_2 + N I_1 I_2, & \text{otherwise} \end{cases} \qquad (14)$$

where $I_1$ and $I_2$ are the inputs from the two OGs at a site, $P$ and $N$ are two constants, and $\theta_1$, $\theta_2$ are two thresholds. $P$ and $N$ are constrained by $2\theta_2 P = N\theta_2^2$. Note that (14) is not continuous across these regions.

A LORF can take any form satisfying $\lambda(r) \ge 0$ and $\lambda(r) \to 0$ as $r \to \infty$. Here, we choose an OG (formula illegible in the source), in which $\lambda_{ij}(0)$ $(i, j = 1, 2)$ is the amplitude of the LORF, $\sigma_\parallel$ and $\sigma_\perp$ are the sizes of the OG in the tangential and normal directions of the pinning bar, and $r_\parallel$ and $r_\perp$ are the distances from the pinning center along those two directions.

4 Collinear Grouping, Filling-in and Perception of Apparent Brightness as Dynamic Phase Transitions

LORNN displays recovery of 1D shapes from local pinning bars, collinear grouping, edge and region filling-in, illusory figure perception, and perception of apparent brightness as a kind of dynamic phase transition. When certain parameters cross critical values, 1D shapes are recovered, and collinear grouping, edge and region filling-in, illusory figures, and apparent brightness emerge (see Table 1). The energy jumps by a finite amount as the parameters cross their critical values, so these phase transitions are first order. This behavior clearly differs from that of ordinary diffusion or reaction-diffusion models, ART models, or RBF models. Fig. 1 presents some examples; in all of them we set $P = N = 0$.

Table 1 (contents not recoverable from the source). Gs: Gaussians. Es: exponentials. CEs: contrast edges. $b_c$: critical length (distance, gap, or radius). $J_c$: critical initial connection weight. $K_c$: critical applied field.
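The nonlinear superposition rule (14) for two OG inputs at a site can be sketched as follows. The threshold symbols and the branch conditions are read from a garbled source, so treat this as a reconstruction under that assumption; the function name is hypothetical. With $P = N = 0$, as in the Fig. 1 examples, every branch reduces to $I_1 + I_2$. The constraint $2\theta_2 P = N\theta_2^2$ makes the middle and last branches agree at the corner $I_1 = I_2 = \theta_2$, while the jump at $\theta_1$ remains.

```python
def superpose(i1, i2, p, n, theta1, theta2):
    """Nonlinear superposition of two OG inputs i1, i2 (Eq. (14) as reconstructed)."""
    if i1 < theta1 or i2 < theta1:
        return i1 + i2                    # either input weak: plain addition
    if theta1 <= i1 <= theta2 and theta1 <= i2 <= theta2:
        return i1 + i2 + p * (i1 + i2)    # both intermediate: linear boost
    return i1 + i2 + n * i1 * i2          # otherwise: multiplicative term


# Constants chosen to satisfy 2 * theta2 * P = N * theta2 ** 2 (illustrative values).
p, theta1, theta2 = 0.5, 0.2, 1.0
n = 2 * p / theta2
corner = superpose(theta2, theta2, p, n, theta1, theta2)
just_above = superpose(theta2 + 1e-9, theta2 + 1e-9, p, n, theta1, theta2)
```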